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Abstract 

We propose a novel algorithm for calculating multi-baryon correlation func- 
tions on the lattice. By considering the permutation of quarks (Wick contrac- 
tions) and color/spinor contractions simultaneously, we construct a unified 
index list for the contraction where the redundancies in the original contrac- 
tion are eliminated. We find that a significant reduction in the computational 
cost of correlators is achieved, e.g., by a factor of 192 for and ^He nuclei, 
and a factor of 20736 for the "^He nucleus, without assuming isospin symme- 
try. A further reduction is possible by exploiting isospin symmetry, and/or 
interchange symmetries associated with sink baryons, if such symmetries ex- 
ist. Extensions for systems with hyperons are presented as well. 

Keywords: Lattice QCD, Hadron-Hadron Interactions, Multi-Baryon 
Correlators, Contraction Algorithms 

1. Introduction 

Correlation functions of multi-baryon systems are the central quantities 
to be calculated when determining the properties and interactions of atomic 
nuclei directly from lattice QCD (+QED) simulations. The computational 
cost of constructing such correlators is, however, known to be exceptionally 
enormous for large mass number A, and one of the greatest challenges is to 
find an efficient algorithm for reducing it. The reason for such a high cost is 
that (i) the number of quark permutations (Wick contractions) grows factori- 
ally with A and (ii) the contraction of color/spinor degrees of freedom (DoF) 
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becomes exponentially large for large A. While there has been significant 
progress toward reducing this computational cost 

flaaaa 

, it continues 

to remain the most time-consuming part of the calculation, particularly for 
A>2E 



Lattice QCD simulations for multi-baryon systems date back to Refs. [13 



14| . where energies of two-nucleon (2N) systems in a box were extracted 



from temporal correlators in Euclidean space-time, and then related to 2N 
scattering lengths through the use of Liischer's formula 15|, Il6|^ IITH. Sim- 
ilar methods have been employed in recent studies as well 
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18 . In 



Refs. a a a^ ^ approach had been proposed, where nuclear forces 
were directly extracted from Nambu-Bethe-Salpeter (NBS) wave functions, 
or spacial correlators of 2N systems. This method was successfully extended 
to general hadron-hadron interactions such as hyperon-nucleon (YN) and 
hyperon-hyperon (YY) potentials [ll0, [21I, H, [isl, 0, Q |26|- A fur- 



ther extension was proposed in Ref. 
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where both spacial and temporal 



dependencies of correlators are utilized to extract non-local hadron-hadron 
potentials without requiring ground state saturation. 

Only quite recently, however, have lattice QCD studies for three- and 
higher-baryon systems been initiated: the NPLQCD Collaboration demon- 
strated a feasibility study for the energy of a system with S°S°n quantum 
numbers 10||; the PACS-CS Collaboration studied the energies of ^He and 
■^He at several lattice volumes and concluded that both ^He and ^He are 
bound states p|. In Ref. the HAL QCD Collaboration investigated three- 
nucleon forces (3NF) using the NBS wave function of three nucleons, and 
repulsive 3NF at short distance were found in the triton (^H) channel. One 
of the major obstacles in each of these studies was the computational cost of 
quark contractions, as discussed above. 

The purpose of this paper is to present a novel algorithm for the compu- 
tation of multi-baryon correlators. In particular, by considering the quark 
permutation and the color/spinor contractions simultaneously, we show that 
there exist large redundancies in the contributions to the correlator. By con- 



^ In this paper, we consider the computational cost of constructing multi-baryon corre- 
lators for a fixed ensemble size, neglecting the potential computational difficulty of achiev- 
ing an acceptable signal/noise ratio for such correlators at late times. This unrelated issue, 
known as the signal/noise problem, grows exponentially with both mass number and time 
separation of correlators [6|. For various attempts to ameliorate this problem, see, e.g., 

Refs. 0,i,i,|i3,[iil,[ii- 



2 



structing a unified index list of non-vanisliing contributions to tlie contraction 
witli tfiose redundancies eliminated, we can achieve a significant speedup for 
the computation of correlators. As will be described later, it is not necessary 
to assume a symmetry between different flavors (e.g., isospin symmetry) in 
this algorithm, although by doing so, an additional reduction in cost can be 
achieved. 

This paper is organized as follows. In Section |21 we describe the multi- 
baryon correlation functions under consideration and review the issues associ- 
ated with computing contractions. In Section [3l we propose a new algorithm 
which utilizes a unified index list for evaluating contractions. In Section HI 
the efficiency of the new algorithm is discussed, while Section E] is devoted 
to summary and concluding remarks. Further details of our results are tab- 



ulated in Appendix A 



2. Multi-baryon correlation functions 

We consider a 2yl-point multi-baryon correlation function with a mass 
number A, defined by 

^ (5„,(Xi)---fi.,(x^)s;,^(x;,)---s;,(xO), (i) 

where B^^ {B' , ) denotes an appropriate baryon interpolating field in the sink 

i 

(source) with a spinor index ai (a^), and coordinate index Xj = (tj, Xj) (X,'). 
We consider a general baryon operator given by 

5„(X) = eciC2C3(CTi)c,i,a2(r2)„,a3g(6)g(6)?(6), (2) 
B'AX') = e444(CT;),,,,,(r^),,,,g(eDg(e2)g(ei), (3) 

where q (c-) denotes color indices, and {^[) is a symbolic label for the 
collection of indices {xi, Ci,ai} with Xi being a quark coordinate index. Sum- 
mation over repeated indices is implied. C = 7472 is the charge conjugation 
matrix and Fj (F^) are appropriate 7-matrices. For instance, the choice of 
(Fi,F2) = {T[,T'2) = (75,1) is often employed for an octet baryon field. In 
the case of point sources and point sinks for quark fields, X = xi = X2 = X3 
and X' = Generalization to smeared quark fields in the sink 

and/or source is straight-forward; in such cases the coordinate indices Xi (x^ 
are replaced by associated smearing parameters. 



3 



2.1. Computation using a straightforward algorithm 

As described in SeclH the computational cost of a multi-baryon correlator 
diverges quickly for large A. One can estimate the number of contractions in 
Eq. ([T]) for a given {aj, a^, tj, t^} as follows. First, if we consider the 3-flavor 
space, the number of quark permutations (Wick contractions) amount to 
-^perm = NJ. ■ NJ ■ Ngl whcrc Nu, Nd and Ns are the number of up, down and 
strange quarks in the system, respectively. For instance, A^perm = 36 for ^H, 
2880 for ^H/^He and 518400 for ^He. Second, one must take into account the 
contractions for the color/spinor DoF. To carry out the counting, we exploit 
the sparse nature of 7-matrices and e-tensor: for each baryon in the sink or 
source, we attribute a factor of six to each color loop (i.e., sum over each 
color index), and a factor of four to each spinor loop (i.e., sum over each 
spinor index). The total cost of the color/spinor contractions therefore scale 
as A'loop = 62^-42^. Particularly, we find iVioop to be 0(10^) for ^H, 0(10^) for 
■^H/^He and O{10^^) for ^He. Third, we must repeat the above computation 
for all possible spacial variables at the sink, {Xi, ■ ■ ■ ,Xa}- For instance, 
when extracting the energy of the system, a zero-momentum projection for 
each baryon is commonly performed [4, 10, ll[ 13, l^, 18 1. When determining 
hadron-hadron potentials, NBS wave functions are extracted by imposing 
zero-momentum for the center of gravity, and the dependencies on relative 
coordinates between baryons are subject of interest [l|, ^ [s], [s], [Til, 19, 20, 21 



22, 23, 24, 25, 26 



by a factor of A^, 



vol 



In both cases, the computational cost will be multiplied 
= L^^, where L is the spacial extent of the latticeH 



2.2. Block algorithm 

Recently, algorithmic progress has been achieved for the computation of 
multi-baryon correlators jH, 111, 0, 0, [sf , by considering a block of three-quark 
propagators combined into a baryon sink. For simplicity, let us consider a 
274-point nucleon correlation function with A = 2, given by 

n„,;3;a',/3'(^i,^2; X(,X^) = {p^iX,)nf,{X2)n'f,,iX^)p'AX[)), (4) 



^ One typically does not count the computational cost associated with summing over 
spacial baryon coordinates, {X[, ■ ■ ■ ,^^}, at the source. Rather, a sum over quark co- 
ordinate indices is implicitly performed at the source by solving quark propagators with 
smearing sources. If one uses, e.g., all-to-all propagators, an additional factor of A'voi 
would arise, however. 
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where the proton and neutron fields are defined by 



u{^i)d{^2)u{^3), (5) 

We construct blocks of three-quark propagators defined by 
(CT?) 

and 

ai,oi2 l-*- 2 J/3, 03 

X [ 5,(6,^0^^(6,^3) -^<i(ei,e3)^^(e3,ei) ]^«(6,e2), (10) 

for all possible indices {X, ^j'}, where Sq{C,i, C,^) = {q{C,i)q{Cj)) denotes a quark 
propagator associated with the fiavor q = {u, d, s}. Using Eqs. and (ITOll . 
the correlation function can be written as 

na,/3; a',^'(Xi, X2; X[,X2) 

= 'Ca(2), 'Ca(3)) ■ /^(-^2; C(4) , C(5) 5 'Cct(6) ) 

xe4,,4(CT'/),.,„. (r;^)„, ■ ee^44(CTr),,,,, (r^")^, ■ sign(a),(ll) 

where = J2au So-d with (cr,) representing the permutation among 
up (down) quarks, and sign(cT) = sign (cr^) sign (cr,) representing a sign factor 
which arises from the anti-commuting property of fermions. 

There are several significant advantages to using Eq. fllip over the straight- 
forward approach. First, in terms of the permutation, we note that Eq. is 
antisymmetric under the exchange of two up quarks in the proton. A similar 
property holds for Eq. ffTOl) under the exchange of two down quarks in the 
neutron. Generally speaking, by exploiting these features, one can restrict 
the full permutation appearing in Eq. ffTTj) . which we refer to as cTfuu, to a 
sub-permutation cTgub, which excludes such exchanges, thus reducing Xperm 
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by a factor of 2^ [H, [s], Is], 0, [sj . Second, one finds that since the color /spinor 
contractions in the sink are performed prior to evaluating Eq. ( ITT]) . A^ioop is 
reduced from 6^"^ ■ to ■ 4A. Note that the computational cost of eval- 
uating Eqs. dH]) and f fTOj) is negligible, once up/down quark propagators are 
determined. Finally, this algorithm enables us to reduce the computational 
cost for the momentum projection. In fact, it is efficient to transform 
to momentum-space first, prior to the calculation of Eq. flTT]) . In this way, 
(e.g., in the case of the computation of NBS wave functions), one can per- 
form the zero-momentum projection onto the center of gravity utilizing the 
convolution technique. This reduces A^voi from L^"^ down to 0{L^^~^) jij. 
Furthermore, if one is interested in only the energy of the system using the 
correlator with each sink baryon projected onto zero-momentum (or any fixed 
momentum), iVvoi = £>(!), insensitive to A jj]. 

We note that an additional improvement has been carried out in Ref. 
in the isospin symmetric limit. By exploiting the permutation symmetry 
of protons and neutrons in the baryon interpolating field as well as other 
techniques, they achieved a significant reduction of A^perm, down to A^perm ~ 
93 for 3He (^H) and A^p.^^ = 1107 for ^He " 



3. Unified contraction algorithm 

We develop a new technique for evaluating contractions in correlation 
functions such as those defined in Eq. ([T]) by considering the permutation of 
quarks (Wick contractions) and the color/spinor contractions simultaneously. 
In doing so, we may eliminate redundancies as much as possible. Although 
the technique is rather general, and may be applied as an extension of either 
the straightforward algorithm or block algorithm, we focus our study on the 
latter case for simplicity. 

To demonstrate the idea, we again consider a 2A-point nucleon correla- 
tion function with A = 2, represented by block components given in Eq. fflT]) . 
In our algorithm, we evaluate Eq. f lTTj) under the condition that quarks of the 
same fiavor have the same space-time source point, or more generally, have 
the same space-time smearing function at the source. Under this condition, a 
permutation of quark operators in the source is equivalent to a permutation 
of color and spinor indices of the corresponding quark sources, and thus we 
can rewrite Eq. (fTTj) as 

= Faix^; e;, Q ■ mx^; e;, e,. o x c^juei, ■■■,q, m 
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where 



^q'/3'(^1' ■ ■ ■ ' ^e) 

= 5Z ^<{1)<(2)<(3)('^^1^)°'.(1)'<{2)(^2^)°'.<(3) 
<T-1 

xee' c' (CT'")«/ (ro")/?'^' ■sign(a), (13) 

and the sum is carried over the inverse permutations a~^. An essential feature 
of this result is that the computation of a permutation is absent in Eq. fll2p . 
To compensate for this, one must instead perform a permutation calculation 
to evaluate Eq. f|T3|) . Since the summand in Eq. f|T3l) is independent of the 
gauge field, however, this calculation need only be carried out once, and 
independently of any lattice simulation. As was the case for Eq. ffTTl) . the 
permutation sum in Eq. f[T^ may be taken over either the full permutation 
(cfuii) or over the sub-permutation (o"sub)- Generally, there is no difference 
between summing over cr^^ or a in Eq. (fT3|) in the former case, but in the 
latter case there is a difference, depending on the particular sub-permutation 
chosen. Note that in Eq. (fT3|) . although depends on the quark coordinate 
index at the source, because of the same-source condition imposed on the 
quark fields, this index is irrelevant in its evaluation. 

We note that the evaluation of Eq. f|T3|) amounts to preparing a unified 
index list for Wick and color/spinor loop contractions in which only non- 
zero components of the coefficient matrix C^p,{^[, ■ ■ ■ ,^g) are tabulated. In 
particular, if there exists any redundancy and/or cancellation among con- 
tributions in the original contraction, they are automatically consolidated 
when constructing the unified index list. Considering the sparse nature of 
7-matrices and e-tensors together, it is expected that the number of non- 
zero elements in the coefficient matrix is rather small, resulting in significant 
speedups in the computation of correlatorsjf] Note also that it is unnecessary 
to assume any symmetry between different flavors (e.g., isospin symmetry) 
in this algorithm, since only permutations among quarks of the same flavor 
are utilized. Various techniques to improve the signal in correlation functions 



have been investigated, including the use of all-to-all propagators [28[ and 



■^In evaluating multi-hadron correlation functions, correlation functions may suffer from 
round-off error due to a large number of cancellations among contributing terms [27| . By 



evaluating Eq. (|13|) . a subset of these cancellations are performed exactly using integer 
arithmetic, resulting in a reduction in round-off errors. 
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novel smearing methods such as distillation 29|. Generally, the application 



of our approach in such cases is straight-forward, although the degree of can- 
cellation achieved in evaluating the analog of Eq. f lT3|) could depend heavily 
on the details of the operator construction. 



4. EfRciency of the unified contraction algorithm 

In order to examine the efficiency of the unified contraction algorithm, 
we explicitly evaluate coefficient matrices for typical examples of interest. 
Particularly, we study systems composed of octet baryons. With regards to 
the explicit spinor structure of a baryon operator at the source, we consider 
two choices, 

(r;,r'2) = (75,1), (14) 

(r;,r'2) = (75Pnr,Pnr), P„. = (1 + 74)/2. (15) 

For each multi-baryon correlator, we employ either Eq. f|T4|) or f fTSj) for all 
baryon operators at the source, and do not mix the two choices. The choice 
of Eq. f lTSj) has been employed in recent multi-baryon studies 0, Isf, because 
only the upper half components of Dirac spinors survive (working in the Dirac 
basis for 7-matrices), thus reducing the number of spinor loop contractions 
by a factor of Hereafter, we refer to Eqs. (fT^ and (ITS!) as "standard" 
and "non-relativistic" operators, respectively. Note that the evaluation of 
Eq. (fT3|l does not depend on the spinor structure of a baryon operator at the 

sink, (ri,r2). 

When evaluating Eq. f ll3p . we may in principle consider two choices for 
the permutation, cxfuii and (Jsub, as discussed in Sec. 12.21 and Sec. [31 While 
(Tsub is trivially a better choice in the block algorithm, cXfuii offers an advan- 
tage in the unified contraction scheme. In particular, note that the coefficient 
matrix obtained from afuu solely depends on the structure of the source oper- 
ators, while, in the case of (Xsub; it also depends on the sink baryons implicitly 
through the definition of cXsub- Therefore, when one considers coupled chan- 
nel correlation functions where source baryons and sink baryons could be 
different, the contraction list obtained from cTfuii has broader utility. 



^ Note that although the operator has an apparent non-relativistic form, the states, 
which are dynamically generated on the lattice, are not subject to any non-relativistic 
approximation. 
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When sink baryons in the correlator are specified, one may exploit any 
existing inner-quark exchange symmetries at the sink to explicitly constrain 
the sum over indices in Eq. f|T2l) . and thus further reduce the computational 
cost of its evaluation. For example, if a baryon block is antisymmetric under 
the exchange of two indices ^- and C,j for i j, and the coefficient matrix is 
also antisymmetric under exchange of the same two indices (as guaranteed 
under the full permutation), then one may constrain the sums in Eq. f ll2p 
such that C,i < i'j- For instance, in the case of multi-nucleon systems, such 
considerations will result in a reduction of 2 in totalEl Note that this is the 
same reduction factor that is achieved by using asnh as opposed to CTfuu in the 
block algorithm. 

We carry out the construction of the unified contraction list using super- 
computers, since in the case of a mass number A > 2, the computational 
cost is found to be quite large depending on the operators chosen, naively 
growing factorially in each quark number (this exponential growth in compu- 
tational cost can be eliminated in some cases by exploiting Pauli exclusion, 
as will be discussed later on). It is, however, just a one-time investment, and 
we intend to make the lists publicly available for future use. We investigate 
the utility of our algorithm by considering two-octet baryon systems in the 
case of A = 2, and multi-nucleon systems, i.e., ^H/^He and "^He, for A = 3, 4. 
For simplicity, we consider single channel systems in this study, while ex- 
tension to coupled channel systems is straightforward. The computational 
cost of correlators using the unified contraction algorithm can be estimated 
by counting the number of non-zero elements, A^ust, in the coefficient matrix 
under the full permutation. The total number of terms in the contraction 
will then be given by A^^contr = Mist 72"^ after exploiting the inner-quark ex- 
change to explicitly constrain the sums in Eq. f lT^ . The values we obtain for 
A^iist and A^'contr are compiled in Appendix A together with A^perm and A^ioop 
obtained in the block algorithms. 

In order to make a comparison of methods easier, we present an effective 
number of permutations, defined by N^^^.^ = A'contr/A^ioop where Moop is taken 
from the corresponding block algorithm. The efficiency of our approach in 
comparison with the block algorithm (i.e., the speed-up factor) is given by 



^ One can apply a similar procedure for the unified contraction list with Csub, but the 
computational cost in the evaluation of Eq. (fT2|) is equivalent to the cost of the choice 
with (Tfuii, as is evident from the definition of Eq. (jl3p . 
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the ratio r] = A^perm/^pemi' "with a ratio larger than unity indicating an 
improvement. From the tables in Appendix A one immediately observes 
that the unified contraction algorithm yields better efficiency for all multi- 
baryon systems under consideration. What is particularly noteworthy is the 
gain for A = 3 and 4 with non-relativistic operators. In the case of ^H/^He, 
a factor of 192 improvement over the block algorithm is achieved, and in the 
case of ■*He, a factor of 20736 improvement is achieved. We have checked that 
these improvement factors are nearly realized in the actual lattice simulation 
code, as well. We also observe significant improvements in the case of A = 2. 
These improvements are useful for, e.g., coupled channel calculations for YN, 
YY interactions, where considerable computational cost would be required 



using the block algorithm [25 



It is in order that we remark on several aspects of the unified contrac- 
tion algorithm results presented in Appendix A First, a special property 
is observed for "^He correlators when non-relativistic operators are consid- 
ered. Specifically, one finds that A^ust = 518400 is exactly the same as Apcrm 
obtained in the block algorithm with cTfuu. This can be understood intu- 
itively, by noting that color/spinor DoF are completely saturated in ^He 
when quark sources are taken to be equal. This is a simple statement that, 
for every baryon spin component, the ^He analog of the coefficient matrix 
defined in Eq. (fT3|) is proportional to the product of two epsilon tensors in ^'^ 
(one for each flavor). Similar saturation is realized for '^Be, when we employ 
the operator which uses all four spinors of quarks. For larger A, such fermion 
saturation can be exploited to reduce the computational cost of evaluating 
the unified contraction list by noting that the coefficient matrix must be pro- 
portional to an epsilon tensor for a subset of indices corresponding to the 
fermions for which the color/spinor degrees of freedom are fully saturated. 

Second, although we tabulated A^contr for all possible (upper) baryon spin 
indices (a', f3',- ■ ■) at the source, it is not always necessary to calculate the 
correlator for all of them. For instance, the correlator for "^He with a' = P' 
or 7' = 6' should be trivially zero because of the ant i- commuting property 
of source baryons. It is interesting that the unified contraction algorithm 
exposes this feature explicitly, as is evident from the tables in Appendix A[ 
In the same way, the ''He correlator with, e.g., (a', f3', 7', 6') and (/3', a', 6', 7') 
should be same, so one can save computational time by calculating only one 
of them. Third, by imposing additional constraints on baryon sink opera- 
tors, further reduction in computational cost is possible. For instance, when 
we consider the correlation function with each sink baryon projected onto 
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zero-momentum, one may exploit any existing exchange symmetry among 
the same baryons in the sink [4]. This leads to a further reduction of the 
computational cost, e.g., by a factor of two and four for ^H/^He and ^He, 
respectively. Depending of the system of concern, one may also exploit addi- 
tional symmetries, if any, to skip the computation of redundant correlators. 
For example, if one assumes isospin symmetry, the computational cost of the 
above mentioned ^He correlator can be reduced further by about a factor of 
two. 

The proposed algorithm is rather general, and so it is possible to extend 
the technique to other systems of interest. An immediate application is 
to multi-hadron correlators including mesons, although for purely mesonic 
correlators, it remains to be determined whether the technique offers any 



advantage over the recursive approaches of Refs. [30, 131|- Furthermore, the 
method may be extended to various problems in quantum mechanics, where 
Slater determinants often appear as the subject of interest. If the coefficients 
of the Slater determinants have a sparse nature, a similar prescription may 
provide an efficient alternative. 

Finally, let us discuss the limitations of the unified contraction algorithm. 
In order to satisfy the same-source condition described in Sec. |3l the number 
of quarks associated with each flavor must be less than or equal to 12 due to 
Pauli exclusion. Therefore, the maximum mass number allowed is Amax = 8 
in 2-flavor space and A^ax = 12 in 3-flavor space, respectively. We note, 
however, that even for a system with A > Amax, the presented algorithm 
is expected to be efficient since it can reduce the computational cost corre- 
sponding to the subspace of permutations, which is spanned by imposing the 
same-source condition on as many quarks as possible. Further studies are 
currently underway. 

In the limit of large A, suppressing the computation of Wick contractions 
in Eq. ([1]) becomes most important, since A^perm grows factorially in quark 
number, whereas the others (A'loop, Avoi) grow exponentially with quark num- 



ber. In fact, such an algorithm was proposed in [32|, where the Wick contrac 



tions in Eq. ([T]) are expressed in terms of a determinant of quark propagators. 
It is then essential to realize that the computational cost of the determinant 
of an 77, X 77, matrix can be reduced from 0{n\) to 0{n^) by employing LU- 
decomposition. Unfortunately, this is not an efficient algorithm for light 
nuclei such as ^He, since A^ioop and/or Avoi remain overwhelmingly large as 
discussed in Sec. 12.11 However, as the mass number A grows, the determinant 
algorithm would presumably become a useful approach. 
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5. Summary 

We have proposed an efficient algorithm for calculating multi-baryon cor- 
relation functions on the lattice. By considering the permutation of quarks 
(Wick contractions) and the color/spinor contractions simultaneously, we 
have shown that there exist large redundancies in the original contraction. 
We have developed a method to construct a unified index list for the contrac- 
tion in which the redundancies are eliminated. It is noted that an assumption 
on the symmetry between different flavors (e.g., isospin symmetry) is not re- 
quired in this algorithm, although imposing such symmetries leads to further 
computational savings. Possible extensions of this algorithm have also been 
discussed. 

In order to determine how efficient the algorithm is, we have investigated 
several typical examples of interest, namely, two-octet baryon systems, ^H, 
^He and ^He. We have found that a significant speedup is achieved in all 
cases, in particular, by a factor of 192 for and '^He nuclei and a factor 
of 20736 for the ^He nucleus. For typical correlators of concern, where each 
nucleon is projected onto zero-momentum, further speedup can be achieved, 
e.g., by a factor of 2 and 4 for ^H/^He and ^He, respectively. This achievement 
takes a significant step towards the ultimate objective of studying nuclear 
physics from first principles lattice simulations of QCD (-|- QED). 
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Appendix A. Computational cost of the unified contraction algo- 
rithm 

Here, we show the computational cost of the unified contraction algo- 
rithm for various multi-baryon correlators. In particular, we tabulate the 
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number of non-zero entries (A^ust) appearing in the coefficient matrix, the 
total number of contractions (Ncontr), and the effective permutation number 
(^pemi = ^contr/A^ioop)- As discusscd in Sec. [3], the number of contractions 
required to compute the correlation function is given by iVcontr = -^Ust/S"^ in 
the case of nucleons. For comparison, we also provide A^perm and A^ioop for the 
block algorithm. The increase in efficiency achieved by the unified contrac- 
tion algorithm over the block algorithm is given by the ratio r] = A^pcrm/A^pfrm) 
where A^perm is the number of permutations required in the block algorithm 
using the optimal choice, cTsub- 

Tables are provided for single channel two-octet baryon systems, ^H/^He 
and ^He. For each baryon in the system, only the upper spinor components 
are considered, i.e., a'^ = (0,1) for z = 1,...,A, since each baryon field 
is expected to couple strongly to a corresponding positive parity baryon. 
Because the system is symmetric under the flip of all spin indices, we only 
give results for a'l = 0. For baryon interpolating fields in the source, we 
consider both standard and non-relativistic operators, as described in Sec. HI 

Appendix A.l. Two-octet baryon systems 

We consider single channel systems of two-octet baryons without assum- 
ing flavor symmetry. The contraction list falls into three classes based on the 
flavor content of baryons: (i) pp, nn, E"^S+, S^S^, S^S^ systems, (ii) 

pn, S+S°, S~S~ systems and (iii) nS^, S'^S^ systems. The correlation 
function under consideration is given by Eq. (jl]), or an analog of it. 
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Table A.l: pp, nn, S+E+, S S , 2"^°, s S systems with the standard operators. 



source spin 


block algorithm 


unified contraction algorithm 


cfiicicncy 




-Moop A^perm (<^full) -^perm (""sub) 


-A^list A^contr -^perm 


V 


(0, 0) 
(0, 1) 


576 48 12 
576 48 12 



11088 2772 4.8 


2.5 


Table A. 2: Same as above, but with the non-relativistic operators. 


source spin 


block algorithm 


unified contraction algorithm 


efficiency 




Aloop Aperm (cffull) -^perm (o^sub) 


-Mist Mcontr -^perm 


V 


(0, 0) 
(0, 1) 


144 48 12 
144 48 12 



1008 252 1.8 


6.9 



Table A. 3: pn, S+s°, E S systems with the standard operators. 



source spin 


block algorithm 


unified contraction algorithm 


efficiency 




-Moop -A'pcrm (""full) -A^perm ('''sub) 


-Mist -^contr -^perm 




(0, 0) 
(0, 1) 


576 36 9 
576 36 9 


8316 2079 3.6 
9432 2358 4.1 


2.5 
2.2 


Table A. 4: Same as above, but with the non-relativistic operators. 


source spin 


block algorithm 


unified contraction algorithm 


efficiency 


(a', P') 


-Moop A^perm ('''full) -^perm ("sub) 


-Mist -M;ontr -^perm 




(0, 0) 
(0, 1) 


144 36 9 
144 36 9 


756 189 1.3 
1008 252 1.8 


6.9 
5.1 
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Table A. 5: pS"'", nS , E^S systems with the standard operators. 



source spin 


block algorithm 


unified contraction algorithm 


cfiicicncy 




-^loop -^perm (<^full) -^perm (""sub) 


-^list -^contr "^perm 


V 


(0, 0) 
(0, 1) 


576 24 6 
576 24 6 


5400 1350 2.3 
7776 1944 3.4 


2.6 
1.8 


Table A. 6: Same as above, but with the non-relativistic operators. 


source spin 


block algorithm 


unified contraction algorithm 


efficiency 




^loop ^perm (cffull) -^perm (o^sub) 


-^list -^contr -^perm 


V 


(0, 0) 
(0, 1) 


144 24 6 
144 24 6 


648 162 1.1 
864 216 1.5 


5.3 
4.0 



Appendix A. 2. Three- octet baryon systems 

We consider the correlation function for given by 

na/3-y;a'/3'7' = {PanpUj n'^,n'p,p'^,) . (A.l) 

^He, and the hyperon analogs S+S^S*^, and S^S^S~ 

share the same contraction list. 
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Table A. 7: ■^H, ^He systems with the standard operators. 


source spin 


block algorithm 


unified contraction algorithm 


efficiency 


(a',/3',y) 


-^loop -^perm (ffull) -^perm (i^sub) 


-^list -^contr "^perm 


V 


(0, 0, 0) 
(0, 0, 1) 
(0, 1, 0) 
(0, 1, 1) 


13824 2880 360 
13824 2880 360 

13824 2880 360 
13824 2880 360 



3775680 471960 34.1 
3775680 471960 34.1 




10.5 
10.5 


Table A. 8: Same as above, but with the non-relativistic operators. 


source spin 


block algorithm 


unified contraction algorithm 


efficiency 


(«',/?', V) 


^loop -^pcrm {o"full) -/Vperm (o-sub) 


-^list -^contr -^perm 


V 


(0, 0, 0) 
(0, 0, 1) 
(0, 1, 0) 
(0, 1, 1) 


1728 2880 360 
1728 2880 360 
1728 2880 360 
1728 2880 360 



25920 3240 1.9 
25920 3240 1.9 




192 
192 



Appendix A. 3. Four-octet baryon systems 

We consider the correlation function of ^He given by 

na/375;a'/3'7'<5' = (PaPf^n^ns n'g,fi'yp'0,p'^,) . (A.2) 

The hyperon analogs E+E+S^S'^ and E~E~S~S~ share the same contraction 
list. 
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Table A. 9: '^He system with the standard operators. 



source spin 


block algorithm 


unified contraction algorithm 


efficiency 


(a',/3',y,'5') 


-^loop 


-^perm (o"full) 


-^perm (''"sub) 




iVlist 




-^contr 


jyett 
perm 


V 


(0, 0, 0, 0) 


331776 


518400 


32400 



















(0, 0, 0, 1) 


331776 


518400 


32400 



















(0, 0, 1, 0) 


331776 


518400 


32400 



















(0, 0, 1, 1) 


331776 


518400 


32400 



















(0, 1, 0, 0) 


331776 


518400 


32400 



















(0, 1, 0, 1) 


331776 


518400 


32400 




1407974400 ^ 


^7998400 


265.2 


122.2 


(0, 1, 1, 0) 


331776 


518400 


32400 




1407974400 t 


^7998400 


265.2 


122.2 


(0, 1, 1, 1) 


331776 


518400 


32400 



















Table A. 10: Same as above, but with the non- 


r 


elativistic operators. 








source spin 


block algorithm 


unified contraction algorithm 


efficiency 


(a',/3',y,5') 


-^loop 


-^pcrm (o"full) 


-^perm (""sub) 




A^iist 


-^contr 


perm 




V 


(0, 0, 0, 0) 


20736 


518400 


32400 



















(0, 0, 0, 1) 


20736 


518400 


32400 



















(0, 0, 1, 0) 


20736 


518400 


32400 



















(0, 0, 1, 1) 


20736 


518400 


32400 



















(0, 1, 0, 0) 


20736 


518400 


32400 



















(0, 1, 0, 1) 


20736 


518400 


32400 




518400 


32400 


1.6 






20736 


(0, 1, 1, 0) 


20736 


518400 


32400 




518400 


32400 


1.6 






20736 


(0, 1, 1, 1) 


20736 


518400 


32400 
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